MCMC Sampling of partially-observed multivariate normal nodes #1612

danielturek · 2026-01-26T14:57:20Z

Addresses NCT issue 296.

This PR adds support for MCMC sampling of partially-observed multivariate normal nodes. That is, multivariate-distributed nodes, where some of the dimensions are observed data, and some of the dimensions are unobserved (latent). The partial_mvn sampler essentially operates by assigning univariate RW samplers to the (individual) unobserved dimensions of partially-observed (and therefore also partially-unobserved) MVN-distributed nodes.

More specifically, several cases for sampling the unobserved dimensions are handled:

Any latent dimensions which have no stochastic dependencies are assigned posterior_predictive samplers.
If multivariateNodesAsScalars = TRUE, then all remaining unobserved dimensions are assigned univariate RW samplers.
If multivariateNodesAsScalars = FALSE, then all remaining unobserved dimensions are assigned a single multivariate RW_block.

A variety of test cases are also added.

`partial_mvn` sampler, to have the code here for posterity.

danielturek · 2026-01-26T21:30:03Z

Testing passed. I welcome any thoughts or review from @paciorek or @perrydv

paciorek · 2026-02-10T02:27:27Z

So do we think the most common use case would be that y[1:n] is a leaf node containing some data and some non-data elements and therefore the non-data are predictive nodes? In that case it looks like we assign a RW_block sampler. So @danielturek I'm not sure what you meant by "Any latent dimensions which have no stochastic dependencies are assigned posterior_predictive samplers." More substantively, should we work out the true MVN conditional distribution in that situation and sample from the true predictive? If we do think this is a common case, then I think it might be worth it. (And I would be happy to lend a hand if we decide to do it.)

danielturek · 2026-02-11T16:24:01Z

@paciorek Please see the addition of the partial_mvn_pp sampler ("partial MVN posterior predictive" sampler), which is used internally in the partial_mvn sampler.

When one or more dimensions (from among the unobserved dimensions) of a partially-observed MVN node themselves have no data dependencies, then a partial_mvn_pp sampler is now assigned to these dimension(s) - instead of either RW or RW_block samplers, as before. This new partial_mvn_pp generates draws directly from the conditional MVN distribution.

@paciorek A second set of eyes would be appreciated. There might also be some efficiency improvements with the linear algebra within the partial_mvn_pp sampler, which could be improved upon from the present implementation.

paciorek

thanks, @danielturek . I took a reasonably careful look though I didn't look too carefully at the tests. In addition to my line-specific comments (some of them niggly - sorry!):

We should discuss whether to cache the pieces of the conditional mvn, in particular choleskies if we know that the covariance is fixed. We could leave this for a future improvement, though if you have momentum on all this, perhaps we do it now?
I'd like us to check if partial_mvn_pp is correctly assigned, so how about we build the MCMC in some of the tests and check the internal sampler assignments?
I'm not sure about whether we should hide this behind an option. Perhaps it's fine if we don't, though if we add the caching and are less certain about that, then maybe we should.
Can we check MCMCuseBarkerAsDefaultMV and use Barker instead of sampler_RW_block if it is on?
Also I think we should have a test that checks correctness of the partial_mvn_pp sampler in a basic case.

I'm happy to help with any of this if you like.

packages/nimble/R/MCMC_samplers.R

danielturek · 2026-02-11T21:37:28Z

@paciorek Thanks for the careful review.

Repeating the request from above, my first preference would be to make the backsolve and forwardsolve changes myself. But that would take a bit more time for me to understand and make sure I do it right, and I'm out of time for today. If you are willing to make this change yourself, and leave the original (current) code there commented out, I would very much appreciate that, and I'll take a very careful look later.

I'd be happy to discuss strategies for "caching" the components of the covariance matrix. But I think it would have to be done with some thought and some caution. Even if the covariance node has no parent nodes (stochastic or otherwise), the contents of the node can still be modified (regardless if the value is provided by constants, data, or initial values). So I can imagine a few ways we might try to do this, for example:

"If the covariance node has no parent nodes, then we have a indicator internal to the sampler (say firstRun) which indicates if this is the first run of the MCMC (initially TRUE), and when firstRun is TRUE, we pull the value of the covariance from the model, and perform the factorizations etc, and cache those values in the sampler. Then, on subsequent runs of the sampler, when firstRun is FALSE, we just used the cached values. However, when the reset method is called, indicating starting a new run of the MCMC, we set firstRun back to TRUE, in case the user changed the value of the covariance (inside the model) in between MCMC runs. (and, we also take it on blind faith that a user didn't change the value of the covariance, then continue an MCMC run using reset = FALSE, which would make the entire MCMC a little questionable, and make the subsequent MCMC results incorrect in some sense)."

This is a tricky one, unless you have some better ideas that I'm not imagining.

In the case of assigning the barker sampler in place of the RW_block sampler, need we do anything to enable derivatives in the model?

@paciorek Yes, please go ahead and make / add any tests for any of the following:

any testing with changing the nimble option MCMCuseBarkerAsDefaultMV to TRUE, and correct assignment and operation of the barker sampler.
check if partial_mvn_pp is correctly assigned, build the MCMC in some of the tests and check the internal sampler assignments
test that checks correctness of the partial_mvn_pp sampler in a basic case.

…al_mvn

paciorek · 2026-02-11T21:57:43Z

Ok I pushed a commit that has the implementation of the Cholesky of Sigma22. I didn't run tests (though CI will of course) or check correctness yet.

As far as Barker, yes, that's correct the model would need to have derivs built. But I don't think anything is different with using Barker here than in our default MCMC config if that Barker option is set.

Regarding caching, I will comment on that in a moment below.

I will try to work on some of the other things this afternoon as well. Or tomorrow.

Unless Frederic indicates urgency in response to my email I am now planning that we include this in 1.4.1 and am thinking I will push this to CRAN on Tuesday. I will be out of town over the long weekend.

paciorek · 2026-02-11T22:06:49Z

Regarding caching, I think the case that we can safely use cached values is if the covariance is RHS-only. If it has no parent nodes it could still be being sampled in an MCMC (e.g., if it is dwishart or dlkj with fixed prior parameters).

Two other thoughts:

we could do similar machinations with the mean so as to save recalculation of the conditional mean, which involves matrix-vector manipulations. Clearly less important but if we are working out the caching perhaps we might as well.
we could in cases where we don't know if the values might have changed, actually check if the cached covariance (and perhaps mean) are the same as the current values from getParam. Then if they are, we can avoid some calculations. Checking should be O(n^2), while the saved calculations are O(n^3) (dominated by the Cholesky of either Sigma11 or Sigma22). But in cases where the covariance is changing every iteration then we are doing a bunch of useless checking. So I think it probably doesn't make sense to do this.

I guess if my reasoning about checking for RHS-only is correct, I propose we combine that with your sketch of using firstRun. And I think we might as well check if the mean is RHS-only too.

paciorek · 2026-02-11T22:53:33Z

I'm working on some additional tests.

@danielturek I don't understand the logic in "Values change at same rows for unobserved parts of partially observed node when multivariateNodesAsScalars = FALSE" test. Based on the model is uses the _pp sampler, so multivariateNodesAsScalars is not involved. Don't you want a model where y is intermediate?

danielturek · 2026-02-12T01:49:33Z

@paciorek That test predated the addition of the partial_mvn_pp sampler - meaning, that test existed before today. The idea behind that test was to confirm that all unobserved dimensions of the MVN node change in unison (or, none of them change) on each MCMC iteration, since when multivariateNodesAsScalars = FALSE, the unobserved dimensions should have a RW_block sampler assigned (and hence all change simultaneously, or all remain unchanged, on each MCMC iteration). This is of course now moot, since for the model in this test the partial_mvn_pp now updates all unobserved dimensions on every MCMC iteration, so they will all (always) change on each iteration. I'm not wedded to this test, there's more of a backstory here than I'll get into, and I would be happy to have this test be removed, or modified, if you're so inclined.

danielturek · 2026-02-12T01:56:27Z

@paciorek I worked through the linear algebra. I imagine it's simple to you, but this was helpful for me, and it was nice to see especially how the forwardsolve can be used to calculate the conditional posterior for Sigma, leveraging the transpose-relationship between Sigma12 and Sigma21. I now have a much better understanding of the uses of forwardsolve and backsolve. I also fixed an extremely minor dimension mistake.

paciorek · 2026-02-12T02:12:48Z

Ok, so my new code had some issues with dimensions that I just fixed. Results are now the same as Daniel's implementation.

I added some tests, but the one I still want to add is one to check correctness of the _pp samples. I'll do that tomorrow morning.

I'll let you decide what to do about the 'moot' test.

paciorek · 2026-02-12T02:14:06Z

I think we stepped on each other's commits a bit, but I think everything is ok now.

danielturek · 2026-02-12T10:35:45Z

@paciorek expect_success does not operate as one might imagine. Use expect_no_error instead.

danielturek and others added 5 commits September 24, 2024 15:34

added getMixedDataNodeNames method for models

c331871

removed comments from NEWS

a56dfd4

partial_mvn (#1543)

1bf77cb

`partial_mvn` sampler, to have the code here for posterity.

resovled merge conflicts

dd5dc7c

fixed unlist for graph node ids

536c62f

danielturek added 6 commits February 10, 2026 15:08

updates

9a40789

updates

1b34ab7

updates

7379099

updates

c53f181

Merge branch 'devel' into partial_mvn

08d1058

Merge branch 'partial_mvn2' into partial_mvn

f95c927

paciorek requested changes Feb 11, 2026

View reviewed changes

danielturek added 3 commits February 11, 2026 15:41

removed extraneous tests of partial_mvn sampler

8465364

updates per comments on PR

8447420

partial_mvn sampler can assign barker sampler

2a4f581

paciorek added 2 commits February 11, 2026 13:43

Use Cholesky of Sigma22 in partial_mvn_pp.

bc50d56

Merge branch 'partial_mvn' of github.com:nimble-dev/nimble into parti…

26d68f8

…al_mvn

fixed dimension mistake

b5e1fa8

paciorek added 3 commits February 11, 2026 17:57

Use Cholesky in sampler_partial_mvn_pp.

10c8603

Add more testing for sampler_partial_mvn.

a2ffffb

Fix silly merge conflict.

b257249

fix spacing again

74d0e65

remove extra tests, and fix use of expect_success

302d7b6

danielturek requested a review from paciorek February 12, 2026 10:36

MCMC Sampling of partially-observed multivariate normal nodes #1612

Are you sure you want to change the base?

MCMC Sampling of partially-observed multivariate normal nodes #1612

Uh oh!

Conversation

danielturek commented Jan 26, 2026

Uh oh!

danielturek commented Jan 26, 2026

Uh oh!

paciorek commented Feb 10, 2026

Uh oh!

danielturek commented Feb 11, 2026

Uh oh!

paciorek left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danielturek commented Feb 11, 2026

Uh oh!

paciorek commented Feb 11, 2026

Uh oh!

paciorek commented Feb 11, 2026

Uh oh!

paciorek commented Feb 11, 2026

Uh oh!

danielturek commented Feb 12, 2026

Uh oh!

danielturek commented Feb 12, 2026

Uh oh!

paciorek commented Feb 12, 2026

Uh oh!

paciorek commented Feb 12, 2026

Uh oh!

danielturek commented Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

paciorek left a comment •

edited

Loading